For large clause-to-variable ratio, typical $K$-SAT instances drawn from the uniform distribution
have no solution. We argue, based on statistical mechanics calculations using the replica and
cavity methods, that rare satisfiable instances from the uniform distribution are very similar to
typical instances drawn from the so-called planted distribution, where instances are chosen uniformly
among the ones that admit a given solution. It then follows, from a recent article by Feige,
Mossel and Vilenchik, that these rare instances can be easily recognized (in $O(\log N)$ time and with
probability close to 1) by a simple message-passing algorithm.
I. INTRODUCTION



A lot of effort has recently been devoted to the investigation of the computational complexity of hard
computational problems under input model distributions. One popular case is the $K$-Satisfiability ($K$-SAT) problem with
uniform distribution, where clauses are picked uniformly at random from the set of $K$-clauses over $N$ Boolean
variables [1]. It is widely believed that there exists a phase transition when the number of clauses, $M$, and of variables,
$N$, go to infinity at fixed ratio $\alpha$. Instances with ratio $\alpha$ smaller than some critical value $\alpha_c(K)$ typically admit a
solution, while instances with ratio $\alpha > \alpha_c(K)$ are almost surely not satisfiable. Rigorous studies combined with
statistical physics methods have produced bounds and estimates for the value of $\alpha_c(K)$ and conjectured the existence
of a rich structure of the space of solutions in the satisfiable phase [2-8].
While those results are certainly interesting from the random graph theory point of view, their relevance to computer
science is a matter of debate. The major concern is that they are highly specific to one particular distribution of
instances, with no obvious theoretical generality or usefulness for practical applications where instances are highly
structured or extracted from unknown distributions. Recently, however, a strong motivation for the study of random
$K$-SAT in computer science was pointed out by Feige [9]. Under the assumption that 3-SAT is hard on average under the
uniform distribution, Feige proved worst-case hardness of approximation results for many different problems
such as min bisection, dense $K$-subgraph, max bipartite clique, etc. The average-case hardness hypothesis can be
informally stated as: there is no fast algorithm capable of recognizing every satisfiable instance and most unsatisfiable
instances for arbitrarily large (but bounded when $N \to \infty$) ratio $\alpha$.
In the present work we point out that a similar but stronger hypothesis, with "every" replaced by "with probability $p$", is
wrong whatever $p < 1$. For large $\alpha$ (well above $\alpha_c$) typical instances of the uniform distribution have no solution. We
argue, based on statistical mechanics calculations using the replica and cavity methods, that rare satisfiable instances
from the uniform distribution are very similar to typical instances drawn from the so-called planted distribution,
where instances are chosen uniformly among the ones that admit a given solution. Our result then follows from a
recent article by Feige, Mossel and Vilenchik, who showed that, for $K$-SAT with the planted distribution, a simple
message-passing algorithm is able to find the solution with probability $1 - e^{-O(\alpha)}$ in polynomial time [10].



II. DEFINITIONS 



We consider random $K$-SAT instances (formulas) with $N$ Boolean variables and $M = \alpha N$ clauses. A clause has
the form $C_i(X) = y_{i_1} \vee y_{i_2} \vee \cdots \vee y_{i_K}$, where $y \in \{x, \bar{x}\}$ represents the variable or its negation; the $K$-SAT problem
consists in finding an assignment $X$ such that $\bigwedge_i C_i(X) = \mathrm{TRUE}$. Sometimes we will specialize to the case $K = 3$ for
simplicity. We will consider the following distributions over the formulas $F$ [10]:

• the uniform distribution $\mathcal{P}_{\rm unif}[F]$ over all possible formulas with $N$ variables and $M = \alpha N$ clauses made of $K$
literals (variables or their negations) corresponding to different variables;

• the distribution $\mathcal{P}_{\rm sat}[F]$ obtained from the distribution above by conditioning on satisfiability. In other words,
$\mathcal{P}_{\rm sat}[F]$ gives uniform probability to all satisfiable formulas and zero to the others;

• the planted distribution $\mathcal{P}_{\rm plant}[F]$, which is constructed as follows: first one extracts with uniform probability
one configuration $X$ of the $N$ variables, and then extracts with uniform probability one formula among the ones
that admit $X$ as a solution. Non-uniform variants of $\mathcal{P}_{\rm plant}$ were studied in [12].

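The number of formulas that have a given solution $X$ is independent of $X$ for symmetry reasons, $\mathcal{N}_f[X] = \mathcal{N}_f =
\left[\binom{N}{K}(2^K - 1)\right]^M$. Define $\chi[F; X] = 1$ when $X$ is a solution to $F$ and $\chi[F; X] = 0$ otherwise, and $\mathcal{N}_s[F]$ the number of
solutions to $F$. We have

$$ \mathcal{P}_{\rm plant}[F] = \frac{1}{2^N} \sum_X \frac{\chi[F; X]}{\mathcal{N}_f[X]} = \frac{\mathcal{N}_s[F]}{2^N \mathcal{N}_f} , \tag{1} $$

thus $\mathcal{P}_{\rm plant}$ is not uniform over the satisfiable instances, but is proportional to the number of solutions to a given
formula.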
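
The construction of the planted distribution can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `sample_planted` and `is_satisfied`, and the representation of a clause as a list of (variable index, negated?) pairs, are choices made here and do not come from [10]. Clauses are drawn uniformly and rejected when the planted configuration violates them, which leaves the $2^K - 1$ allowed sign patterns equally likely.

```python
import random


def is_satisfied(clause, x):
    # A literal is true when the variable's value differs from its "negated" flag.
    return any(x[i] != negated for i, negated in clause)


def sample_planted(n_vars, n_clauses, k, rng):
    """Draw a configuration X and a formula F that admits X as a solution."""
    x = [rng.random() < 0.5 for _ in range(n_vars)]
    formula = []
    while len(formula) < n_clauses:
        idx = rng.sample(range(n_vars), k)             # K distinct variables
        neg = [rng.random() < 0.5 for _ in range(k)]   # uniform negations
        clause = list(zip(idx, neg))
        # Rejection step: keep the clause only if X satisfies it.
        if is_satisfied(clause, x):
            formula.append(clause)
    return x, formula


rng = random.Random(0)
x, formula = sample_planted(n_vars=20, n_clauses=100, k=3, rng=rng)
```

Since every clause is drawn independently, each formula admitting $X$ is obtained with the same probability, as required by the definition above.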

In [9] the following hardness hypothesis was introduced for formulas drawn from the uniform distribution $\mathcal{P}_{\rm unif}$,

Hypothesis 1: Even if $\alpha$ is arbitrarily large (but independent of $N$), there is no polynomial time algorithm that
on most 3-SAT formulas outputs UNSAT, and always outputs SAT on a 3-SAT formula that is satisfiable.

and used to derive hardness of approximation results for various computational problems. A stronger form of
hypothesis 1 is obtained by replacing "always" with "with probability $p$" (with respect to the uniform distribution over the
formulas and possibly to some randomness built in the algorithm):

Hypothesis $1_p$: Even if $\alpha$ is arbitrarily large (but independent of $N$), there is no polynomial time algorithm that
on most 3-SAT formulas outputs UNSAT, and outputs SAT with probability $p$ on a 3-SAT formula that is satisfiable.

We want to present some arguments supporting the idea that hypothesis $1_p$ is false for any $p < 1$. Indeed, in [10] it has
been shown that, if the formulas are drawn with probability $\mathcal{P}_{\rm plant}$, then a solution is found in polynomial time with
probability $1 - e^{-O(\alpha)}$ by a message-passing algorithm, called Warning Propagation (WP). WP is a simplified version
of the zero-temperature Belief Propagation procedure, see [10, 11] for a presentation. It is important to notice that
WP is a constructive algorithm: when it declares a formula to be satisfiable, it provides a solution. This means that it
never outputs SAT on a formula which is unsatisfiable. On the other hand, if the algorithm has not found a solution
after a given number of iterations (which depends on $N$, see below), we declare the output to be UNSAT.
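
To make the message-passing step concrete, here is a minimal Python sketch of a warning-propagation iteration. It is not the exact procedure analysed in [10]: the parallel update schedule, the random initial warnings and the function name `warning_propagation` are assumptions made for illustration, and the sketch stops at the partial assignment without the final step that assigns the remaining variables.

```python
import random


def warning_propagation(n_vars, formula, max_iter, rng):
    """Return a partial assignment (True/False/None per variable),
    or None if the warnings have not converged after max_iter sweeps.
    A clause is a list of (variable index, negated?) pairs."""
    # warn[(c, i)] = 1 when clause c tells variable i "you must satisfy me".
    warn = {(c, i): rng.randint(0, 1)
            for c, clause in enumerate(formula) for i, _ in clause}
    occ = [[] for _ in range(n_vars)]
    for c, clause in enumerate(formula):
        for i, negated in clause:
            occ[i].append((c, negated))

    def field(i, skip=None):
        # Warnings from clauses containing x push towards TRUE (+1),
        # warnings from clauses containing its negation towards FALSE (-1).
        return sum((-1 if negated else 1) * warn[(c, i)]
                   for c, negated in occ[i] if c != skip)

    for _ in range(max_iter):
        new = {}
        for c, clause in enumerate(formula):
            for i, _ in clause:
                # Warn i only if every other literal of c is pushed to be false.
                new[(c, i)] = int(all(
                    field(j, skip=c) > 0 if neg_j else field(j, skip=c) < 0
                    for j, neg_j in clause if j != i))
        if new == warn:  # fixed point reached
            return [None if field(i) == 0 else field(i) > 0
                    for i in range(n_vars)]
        warn = new
    return None


# Toy formula (x0) AND (NOT x0 OR x1): both variables are forced to TRUE.
toy = [[(0, False)], [(0, True), (1, False)]]
assignment = warning_propagation(2, toy, max_iter=10, rng=random.Random(1))
```

Returning `None` when the iteration cap is reached mirrors the rule stated above: without convergence the output is declared UNSAT.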

It is natural (and was already suggested in [10]) to try to extend this result to formulas drawn from the distribution
$\mathcal{P}_{\rm sat}$. The main ingredients that are needed in the proof of [10] are the following:

1. at large $\alpha$, formulas drawn from $\mathcal{P}_{\rm plant}[F]$ typically have a single cluster of solutions with a large core: namely,
there is a set $H$ (the core) containing a fraction $1 - e^{-O(\alpha)}$ of variables that have the same value in all the
solutions of a given formula drawn from $\mathcal{P}_{\rm plant}[F]$;

2. the cavity fields (or variable-to-clause messages) corresponding to the core variables, defined roughly as the
number of clauses that are violated if one takes a solution to $F$ and changes the value of a given core variable,
are $O(\alpha)$;

3. the cavity fields for the core variables are $O(\alpha)$ even if they are computed with respect to a random configuration
(see [10] for a precise definition); this is a consequence of the fact that if a variable $x_i$ has value 1 in the solutions
to $F$, then the probability of this variable appearing as $x_i$ in a clause (according to $\mathcal{P}_{\rm plant}$) is bigger than the
probability of it appearing as $\bar{x}_i$ (and vice versa if the variable is 0 in the solutions).

We claim that formulas drawn from $\mathcal{P}_{\rm sat}$ are very similar to the ones drawn from $\mathcal{P}_{\rm plant}$, and in particular properties
1, 2 and 3 hold for them. Moreover we will show that the relative entropy (Kullback-Leibler divergence) of $\mathcal{P}_{\rm plant}$
with respect to $\mathcal{P}_{\rm sat}$ is $O(N e^{-O(\alpha)})$. In particular property 1 implies that properties of the formulas drawn from $\mathcal{P}_{\rm sat}$
(such as the distribution of the cavity fields) can be computed in a replica symmetric framework. We will indeed show
that this is the case, as the replica symmetric solution is stable for large $\alpha$ if one restricts to SAT formulas. In this
way we will compute the distribution of the cavity fields in a solution to show that: i) only a fraction $e^{-O(\alpha)}$ of the
fields are zero (corresponding to non-core variables); ii) the non-vanishing cavity fields are typically $O(\alpha)$; iii) if the
field corresponding to a variable $x$ is (say) positive, then the number of clauses where the literal $x$ appears is bigger
than the number of clauses where $\bar{x}$ appears, the difference being $O(\alpha)$.

The validity of properties 1-3, together with the fact that the relative entropy of $\mathcal{P}_{\rm sat}$ and $\mathcal{P}_{\rm plant}$ is small, strongly
suggests that the analysis of [10] can be extended to $\mathcal{P}_{\rm sat}$. Then WP will be efficient in finding solutions for satisfiable
formulas in polynomial time, with probability close to 1 for large $\alpha$, thus contradicting hypothesis $1_p$ (but not
hypothesis 1).

III. STATISTICAL PHYSICS ANALYSIS OF $\mathcal{P}_{\rm sat}$

A. From $\mathcal{P}_{\rm unif}$ to $\mathcal{P}_{\rm sat}$: the replica calculation

We want to compute properties of the satisfiable formulas drawn from the uniform distribution, i.e. of $\mathcal{P}_{\rm sat}[F]$, using the
replica method. Following [5] we introduce a cost function

$$ E[X] = \sum_{i=1}^{M} \delta[C_i(X); \mathrm{FALSE}] , \tag{2} $$

where the function $\delta[C_i(X); \mathrm{FALSE}]$ is 1 if clause $C_i$ is false in the assignment $X$ and 0 otherwise (i.e. $E$ counts the
number of violated clauses). The replicated and disorder-averaged (i.e. averaged over the distribution $\mathcal{P}_{\rm unif}[F]$ of the
formulas) partition function is

$$ \overline{Z[F]^n} = \overline{\Big[ \sum_X e^{-\beta E[X]} \Big]^n} = \overline{\big[ g_0 e^{-\beta E_0} + g_1 e^{-\beta E_1} + \cdots \big]^n} , \tag{3} $$

where $E_0$ is the energy of the ground state (i.e. the minimal number of unsatisfied clauses in $F$) and $g_0$ its degeneracy.
In the limit $\beta \to \infty$, $n \to 0$ with fixed product $\nu = n\beta$, defining

$$ P(E_0 = N e_0) = e^{N \omega(e_0) + o(N)} \tag{4} $$

the distribution of the ground state energy with respect to $\mathcal{P}_{\rm unif}[F]$, we have

$$ \overline{Z[F]^n} \sim \overline{g_0^n e^{-\nu E_0}} = \int de_0 \, e^{N[\omega(e_0) - \nu e_0] + o(N)} \sim e^{N \mathcal{F}(\nu)} , \tag{5} $$

since $g_0$ is independent of $n$ and therefore disappears for $n \to 0$. The function $\mathcal{F}(\nu)$ is defined by the saddle point
condition $\mathcal{F}(\nu) = \max_{e_0} [\omega(e_0) - \nu e_0]$. We will verify later that $\mathcal{F}(\nu)$ is convex (for sufficiently large $\alpha$, $\nu$), so that
$\mathcal{F}(\nu)$ and $\omega(e_0)$ are the Legendre transforms of each other. The dominant contribution to the integral in (5) comes
from formulas with ground state energy density $e_0$ given by the equation $e_0(\nu) = -\partial_\nu \mathcal{F}(\nu)$. As we will see from the
calculation of $\mathcal{F}(\nu)$, for large $\nu$ we have $e_0(\nu) \sim e^{-\nu}$, so that to have $e_0(\nu) = 0$ we have to take the limit $\nu \to \infty$.
By imposing this limit we implement the constraint $e_0 = 0$, and obtain information on $\mathcal{P}_{\rm sat}[F]$. Note that we cannot
implement the exact constraint of satisfiability, $E_0 = 0$, but only $\lim_{N \to \infty} E_0/N = 0$, as usual in most statistical
mechanics computations. All our results are then affected by corrections vanishing only for $N \to \infty$.

In a replica symmetric framework the free energy $\mathcal{F}(\nu)$ is obtained by maximizing a functional $\mathcal{F}[R(z), \nu]$ over a
functional order parameter $R(z)$ (Appendix A). This order parameter is the probability distribution of a random
field $z_i$ acting on each variable. The latter is the difference between the minimal number of violated clauses when the
variable $x_i$ is set to TRUE and the same quantity for $x_i$ = FALSE. The distribution $R(z)$ is determined by the saddle
point equations $\delta \mathcal{F}[R(z), \nu] / \delta R(z) = 0$, which admit a solution in which the fields $z$ are integer valued, as expected,

$$ R(z) = \sum_{n=-\infty}^{+\infty} r_n \, \delta(z - n) . \tag{6} $$

The coefficients $r_n$ are obtained by substituting this expression into the saddle point equations and solving them
(Appendix B). In the limit $\nu \to \infty$ the saddle-point equations give a self-consistency equation for $\rho_0 = \lim_{\nu \to \infty} r_0$,

$$ \rho_0 = \frac{1}{2 e^{\gamma(\rho_0)} - 1} , \qquad \gamma(\rho_0) = \frac{\alpha K}{2} \, \frac{\left(\frac{1-\rho_0}{2}\right)^{K-1}}{1 - \left(\frac{1-\rho_0}{2}\right)^{K}} , \tag{7} $$

while all the other coefficients are given by a Poissonian distribution

$$ \rho_n = \lim_{\nu \to \infty} r_n = \frac{\gamma(\rho_0)^{|n|}}{|n|!} \, \frac{1}{2 e^{\gamma(\rho_0)} - 1} , \qquad n \neq 0 . \tag{8} $$
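
As a numerical illustration of Eqs. (7) and (8), the following Python sketch solves the self-consistency equation by plain fixed-point iteration. The function names, the starting point $\rho_0 = 0$ and the tolerance are assumptions of this sketch; the iteration is simply one convenient way to solve (7), and its convergence is only expected at large $\alpha$.

```python
import math


def gamma(rho0, alpha, k):
    """gamma(rho_0) as it enters Eq. (7)."""
    u = (1.0 - rho0) / 2.0
    return 0.5 * alpha * k * u ** (k - 1) / (1.0 - u ** k)


def solve_rho0(alpha, k, tol=1e-14, max_iter=1000):
    """Iterate rho_0 -> 1 / (2 exp(gamma(rho_0)) - 1), starting from 0."""
    rho0 = 0.0
    for _ in range(max_iter):
        new = 1.0 / (2.0 * math.exp(gamma(rho0, alpha, k)) - 1.0)
        if abs(new - rho0) < tol:
            return new
        rho0 = new
    raise RuntimeError("fixed-point iteration did not converge")


def rho(n, alpha, k):
    """Weight of the integer field n: rho_0 for n = 0, Eq. (8) otherwise."""
    rho0 = solve_rho0(alpha, k)
    if n == 0:
        return rho0
    g = gamma(rho0, alpha, k)
    # 1 / (2 e^gamma - 1) equals rho_0 at the fixed point.
    return g ** abs(n) / math.factorial(abs(n)) * rho0


rho0 = solve_rho0(alpha=10, k=3)
total = sum(rho(n, 10, 3) for n in range(-60, 61))
```

For $K = 3$ and $\alpha = 10$ the weights sum to one and $\rho_0$ is below one percent, so almost all fields are non-zero.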



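Using the expressions above, it is possible to show that the ground state energy vanishes exponentially for $\nu \to \infty$,
$e_0(\nu) = -\partial_\nu \mathcal{F}(\nu) = C e^{-\nu}$ (Appendix C).

We shall be interested in the value of $\omega(e_0 = 0)$, which is related to the probability of a formula extracted from
$\mathcal{P}_{\rm unif}[F]$ being satisfiable; setting from (C12) $e_0 \sim e^{-\nu}$ we obtain $\omega(e_0(\nu)) = \mathcal{F}(\nu) + \nu e_0(\nu)$, hence $\omega(0) = \mathcal{F}(\infty)$ up to
corrections $O(\nu e^{-\nu})$. Using the result for $\mathcal{F}(\infty)$ given in Appendix D, we get

$$ \omega(0) = \mathcal{F}(\infty) = \log\left[2 e^{\gamma} - 1\right] - \frac{2 \gamma e^{\gamma}}{2 e^{\gamma} - 1} + \alpha \log\left[ 1 - \left( \frac{1 - \rho_0}{2} \right)^{K} \right] , \tag{9} $$

where $\gamma = \gamma(\rho_0)$.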

To conclude this section, let us discuss the stability of the replica symmetric solution for $\nu \to \infty$, i.e. for satisfiable
formulas. The eigenvalues of the stability matrix around the saddle-point are calculated in Appendix E. We show
that all the eigenvalues are negative for large enough $\alpha$ and $\nu \to \infty$. This implies that the replica symmetric solution
is locally stable, but does not exclude the existence of a first order transition to a different solution. In Appendix F
we show that if one considers a function $R(z)$ different from (6) that also allows non-integer values of the fields, the
weights of the non-integer fields vanish for $\alpha > \alpha_s$ (for some constant $\alpha_s$ which depends on $K$) and one gets back
Eq. (6). This result rules out the existence of a first-order phase transition in the replica symmetric subspace. To
exclude the possibility of a first order transition with replica symmetry breaking one should perform the full 1RSB
computation, which we leave for future work.



B. Interpretation of the self-consistency equations for the field distribution

The self-consistency equation (7) can be recovered within the cavity method. Consider a formula over $N - 1$ variables,
and add a new variable $x$ by connecting it to the other $N - 1$ variables through $\ell_+$ clauses where $x$ enters, and $\ell_-$
clauses where $\bar{x}$ enters. We assume that $\ell_\pm$ are independent Poissonian variables, with probabilities

$$ p_L(\ell_\pm) = e^{-\alpha' K/2} \, \frac{(\alpha' K/2)^{\ell_\pm}}{\ell_\pm!} , \tag{10} $$

where $\alpha'$ is some constant to be determined later. The signs of the other $K - 1$ variables in each clause are chosen
uniformly at random. Then the probability that a new clause constrains the value of $x$ (the clause sends a message
to $x$ in WP language [10, 11]) is equal to

$$ q = \left( \frac{1 - \rho_0}{2} \right)^{K-1} , \tag{11} $$

as $(1 - \rho_0)/2$ is the probability that the field on an 'old' variable due to the existing formula is in contradiction with
its sign in the new clause: e.g. if $C = x \vee x_1 \vee \cdots \vee x_{K-1}$ this is the probability that all the fields on $x_1, \cdots, x_{K-1}$ are
negative, so that $C$ sends a message ("be 1") to $x$. Let us call $m_+, m_-$ the numbers of clauses containing, respectively,
$x$, $\bar{x}$ and sending messages to the new variable. These are independent random variables with probabilities

$$ p_M(m_\pm) = \sum_{\ell = m_\pm}^{\infty} p_L(\ell) \binom{\ell}{m_\pm} q^{m_\pm} (1 - q)^{\ell - m_\pm} = \frac{(\alpha' K q/2)^{m_\pm}}{m_\pm!} \, e^{-\alpha' K q/2} . \tag{12} $$

We will also need later on the weighted distribution of $m_\pm$, where the weight is the number of clauses of type $\pm$,

$$ p_M^{(w)}(m_\pm) = \sum_{\ell = m_\pm}^{\infty} \ell \, p_L(\ell) \binom{\ell}{m_\pm} q^{m_\pm} (1 - q)^{\ell - m_\pm} = p_M(m_\pm) \times \left( m_\pm + \frac{\alpha' K}{2} (1 - q) \right) . \tag{13} $$

Given $m_+, m_-$ the best value for the variable $x$ is TRUE if $m_+ > m_-$ and FALSE otherwise. The minimal number
of violated clauses in the formula therefore increases by $E = \min(m_-, m_+)$. The field acting on the variable $x$ is
the difference between the number of violated clauses when $x$ is FALSE and when it is TRUE, $z = m_+ - m_-$. In
particular the formula remains satisfiable upon inclusion of the new clauses if $m_-$ or $m_+$ is equal to zero. The
joint probability of the increase in energy and of the field reads

$$ P(E, z) = \sum_{m_+ = 0}^{\infty} p_M(m_+) \sum_{m_- = 0}^{\infty} p_M(m_-) \, \delta_{E, \min(m_-, m_+)} \, \delta_{z, (m_+ - m_-)} . \tag{14} $$

Introducing the chemical potential $\nu$ of Section III A we weight each formula with a factor $\exp(-\nu E)$. The probability
that the new variable is subjected to a field equal to $z = n$ is thus

$$ r_n(\nu) = \frac{\sum_E P(E, n) \, e^{-\nu E}}{\sum_{E, z} P(E, z) \, e^{-\nu E}} , \tag{15} $$

where the denominator takes into account the proper reweighting of all formulas at fixed chemical potential $\nu$.

We now restrict to the case of satisfiable formulas, $\nu \to \infty$. From (14), (15) we obtain the probability of having zero
field on $x$ given that the formula is satisfiable,

$$ \rho_0 = \lim_{\nu \to \infty} r_0(\nu) = \frac{p_M(0)^2}{\sum_{m_+, m_-} p_M(m_+) \, p_M(m_-) \, \delta_{0, \min(m_-, m_+)}} = \frac{1}{2 e^{\alpha' K q/2} - 1} . \tag{16} $$



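We now have to take into account the fact that we want each variable to appear in $\alpha K$ clauses on average. Even if $\ell_\pm$
are distributed according to a Poissonian with average $\alpha' K/2$, the fact that when we add the clauses we must discard
the possibilities in which the variable $x$ receives contradictory messages makes the average number of clauses we add
smaller than $\alpha' K$. It is easy to check, using (11) and (16), that

$$ \langle \ell_+ + \ell_- \rangle = \frac{\sum_{m_+, m_-} \left( p_M^{(w)}(m_+) \, p_M(m_-) + p_M(m_+) \, p_M^{(w)}(m_-) \right) \delta_{0, \min(m_-, m_+)}}{\sum_{m_+, m_-} p_M(m_+) \, p_M(m_-) \, \delta_{0, \min(m_-, m_+)}} = \alpha' K \left[ 1 - \left( \frac{1 - \rho_0}{2} \right)^{K} \right] . \tag{17} $$

Then if we want $\langle \ell_+ + \ell_- \rangle = \alpha K$ we have to renormalize $\alpha'$ as

$$ \alpha' = \frac{\alpha}{1 - \left( \frac{1 - \rho_0}{2} \right)^{K}} , \tag{18} $$

and substituting in (16) we get back Eq. (7) as

$$ \gamma(\rho_0) = \frac{\alpha' K q}{2} . $$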



The generating function $G(x)$ of the distribution of variable occurrences $\ell = \ell_+ + \ell_-$ can be easily computed by
adding a weight $x^{\ell_+ + \ell_-}$ in (12) and summing over $m_\pm$ with the constraint $\min(m_+, m_-) = 0$; after a correct normalization
one obtains

$$ G(x) = e^{\alpha' K (x - 1)(1 - q)} \, \frac{2 e^{\alpha' K x q/2} - 1}{2 e^{\alpha' K q/2} - 1} , \tag{19} $$

which is the generating function of the difference of two Poissonian distributions with different parameters. This distribution differs from
the normal Poissonian distribution of occurrences. However the difference is exponentially small in $\alpha$ for all values of
$x$; indeed for large $\alpha$ we get (recalling that $\rho_0 \to 0$)

$$ G(x) = e^{\alpha' K (x - 1)(1 - q/2)} + e^{-O(\alpha)} , \tag{20} $$

which is the generating function of a Poissonian with parameter $\alpha K$, consistently with (17). The fact that the
distribution is Poissonian implies that the cavity fields have the same distribution as the true fields; the latter has
been obtained from the replica method in Section III A.
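
The identity (17) can be checked numerically. The Python sketch below sums the weighted Poissonian distributions (12) and (13) over all pairs with $\min(m_+, m_-) = 0$ and compares the result with $\alpha' K [1 - ((1 - \rho_0)/2)^K]$. The values $\alpha' = 10$, $K = 3$, the truncation `m_max` and the function names are arbitrary choices made for this illustration; $\rho_0$ is first made self-consistent with (11) and (16) by fixed-point iteration.

```python
import math


def poisson(m, mean):
    return math.exp(-mean) * mean ** m / math.factorial(m)


def mean_occurrences(alpha_p, k, rho0, m_max=80):
    """<l+ + l-> restricted to min(m+, m-) = 0, i.e. the ratio in Eq. (17)."""
    q = ((1.0 - rho0) / 2.0) ** (k - 1)                          # Eq. (11)
    lam = alpha_p * k / 2.0                                      # mean of l+-, Eq. (10)
    p = [poisson(m, lam * q) for m in range(m_max)]              # Eq. (12)
    pw = [p[m] * (m + lam * (1.0 - q)) for m in range(m_max)]    # Eq. (13)
    num = den = 0.0
    for mp in range(m_max):
        for mm in range(m_max):
            if min(mp, mm) == 0:          # no contradictory messages
                num += pw[mp] * p[mm] + p[mp] * pw[mm]
                den += p[mp] * p[mm]
    return num / den


alpha_p, k = 10.0, 3
rho0 = 0.0
for _ in range(200):                      # solve Eqs. (11) and (16) together
    q = ((1.0 - rho0) / 2.0) ** (k - 1)
    rho0 = 1.0 / (2.0 * math.exp(alpha_p * k * q / 2.0) - 1.0)

lhs = mean_occurrences(alpha_p, k, rho0)
rhs = alpha_p * k * (1.0 - ((1.0 - rho0) / 2.0) ** k)
```

The two numbers `lhs` and `rhs` agree to numerical precision, which is the content of Eq. (17) and justifies the renormalization (18).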



IV. COMPARISON OF $\mathcal{P}_{\rm sat}$ AND $\mathcal{P}_{\rm plant}$ AT LARGE RATIO $\alpha$

For large $\alpha$ the solution to (7) is well approximated by

$$ \rho_0 \simeq \frac{1}{2} e^{-\gamma} , \qquad \gamma \simeq \gamma(0) = \frac{\alpha K}{2^K - 1} . \tag{21} $$

The distribution of the fields becomes

$$ \rho_n \simeq \frac{1}{2} \, e^{-\gamma} \, \frac{\gamma^{|n|}}{|n|!} . \tag{22} $$

This result implies that the solutions of formulas extracted from $\mathcal{P}_{\rm sat}[F]$ are very similar to each other. They differ
by a fraction $e^{-O(\alpha)}$ of the variables only (this is in fact the weight $\rho_0$ of the field $z = 0$). The remaining fields $z$
have typically values $O(\gamma) = O(\alpha)$, so that the variables in the core (with the same assignments in all the solutions)
have strong cavity fields pointing to their correct assignments. Moreover there is a correlation between cavity fields
(or equivalently, between values of the variable in the solutions) and occurrence of the variables in the formula, as
discussed below. We will in addition show that the results (22) coincide with the ones obtained from the
planted distribution, thus indicating that the two distributions coincide up to errors $e^{-O(\alpha)}$; indeed we will compute
the relative (extensive) entropy of the two distributions and show that it is of the order of $N e^{-O(\alpha)}$.
A. Distribution of the fields

To show that the Poissonian distribution (22) is the same distribution that is obtained from the planted distribution,
recall that $\rho_n$ is the probability of violating $|n|$ clauses when a variable is flipped from the correct value it has in
the ground state to the opposite value. The sign of $n$ is positive if $x_i$ = TRUE in the solution and negative otherwise.

The planted distribution is constructed by extracting at random a configuration $X$, called root, and giving the
same probability to all the choices of the clauses for which $X$ is a solution. Then, if we choose at random a set of $K$
indices $(i_1, \cdots, i_K)$, and consider all the possible $2^K$ clauses we can construct with these indices, we see that only
one of the choices is not allowed (e.g. if the configuration $X$ is such that $(x_{i_1}, \cdots, x_{i_K}) = (1, \cdots, 1)$, only the choice
$\bar{x}_{i_1} \vee \cdots \vee \bar{x}_{i_K}$ is not allowed).

The probability of violating $|n|$ clauses can be computed as follows. For simplicity we choose the root $X = (1, \cdots, 1)$.
Then we want to know how many clauses are violated when we flip one variable, e.g. $x_1 \to 0$. The clauses that are
violated have the form

$$ x_1 \vee \bar{x}_{i_2} \vee \cdots \vee \bar{x}_{i_K} . \tag{23} $$

The probability $p(|n|)$ that $|n|$ such clauses appear in a formula is a binomial distribution with parameter $p$ given by
the product of the probability that the variable $x_1$ appears in the clause, which is $K/N$, times the probability that
the signs of the variables in the clause make it unsatisfied by the flipped assignment, which is $1/(2^K - 1)$:

$$ p = \frac{K}{N} \, \frac{1}{2^K - 1} . \tag{24} $$

The number of clauses is $M = \alpha N$ and they are independent, so the probability of violating $|n|$ clauses is

$$ p(|n|) = \binom{M}{|n|} p^{|n|} (1 - p)^{M - |n|} \to e^{-\gamma} \, \frac{\gamma^{|n|}}{|n|!} , \qquad \gamma = \frac{\alpha K}{2^K - 1} . \tag{25} $$

In a generic root $X$ almost half of the variables are TRUE, giving rise to a positive field, while the other half are
FALSE and correspond to negative fields; then we have

$$ \rho_{\rm plant}(n) = e^{-\gamma} \left( \delta_{n0} + \frac{1}{2} \frac{\gamma^{|n|}}{|n|!} (1 - \delta_{n0}) \right) . \tag{26} $$

This distribution differs from (22) by $e^{-O(\alpha)}$.
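
The Poissonian limit used for the planted field distribution can be illustrated numerically. The Python sketch below compares the binomial law with $M = \alpha N$ trials and success probability $p = K / [N (2^K - 1)]$ to a Poissonian of mean $\gamma = \alpha K / (2^K - 1)$ through their total variation distance. The function names, the truncation `n_max` and the sizes used are assumptions of this illustration.

```python
import math


def binom_pmf(n, m, p):
    return math.comb(m, n) * p ** n * (1.0 - p) ** (m - n)


def poisson_pmf(n, mean):
    return math.exp(-mean) * mean ** n / math.factorial(n)


def tv_distance(n_vars, alpha, k, n_max=60):
    """Total variation distance between the binomial and its Poisson limit."""
    m = int(alpha * n_vars)                    # number of clauses M
    p = k / (n_vars * (2 ** k - 1))            # probability per clause
    g = alpha * k / (2 ** k - 1)               # Poisson mean gamma = M * p
    return 0.5 * sum(abs(binom_pmf(n, m, p) - poisson_pmf(n, g))
                     for n in range(n_max))


small = tv_distance(n_vars=200, alpha=10, k=3)
large = tv_distance(n_vars=2000, alpha=10, k=3)
```

The distance shrinks as $N$ grows at fixed $\alpha$, so the finite-$N$ binomial is already close to the Poissonian for a few hundred variables.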



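B. Correlation between fields and occurrences of the negations

Most variables are typically subject to strong fields, of the order of $\alpha$ in absolute value, in ground state assignments.
We now show that the sign of the field $z$ associated with a variable, say, $x$, is strongly correlated with the numbers of
occurrences of literals $x$ and $\bar{x}$ in the formula.

Consider the cavity derivation of the self-consistent equations for the fields presented in Section III B. Suppose $z > 0$,
and require the formula to be satisfiable ($\nu \to \infty$). Then the numbers of messages coming from $\pm$-type clauses are
$m_- = 0$, $m_+ \geq 1$. We define the average values of the numbers of $\pm$-type clauses, $\langle \ell_\pm \rangle_{z>0}$, as follows:

$$ \langle \ell_+ \rangle_{z>0} = \frac{\sum_{m_+ \geq 1} p_M^{(w)}(m_+) \, p_M(0)}{\sum_{m_+ \geq 1} p_M(m_+) \, p_M(0)} , \qquad \langle \ell_- \rangle_{z>0} = \frac{\sum_{m_+ \geq 1} p_M(m_+) \, p_M^{(w)}(0)}{\sum_{m_+ \geq 1} p_M(m_+) \, p_M(0)} . \tag{27} $$

We get, with $\gamma = \alpha' K q / 2$,

$$ \langle \ell_+ \rangle_{z>0} = \frac{\alpha' K}{2} \, \frac{1 - (1 - q) e^{-\gamma}}{1 - e^{-\gamma}} = \frac{\alpha K}{2} \, \frac{1}{1 - 2^{-K}} + e^{-O(\alpha)} , $$
$$ \langle \ell_- \rangle_{z>0} = \frac{\alpha' K}{2} \, (1 - q) = \frac{\alpha K}{2} \, \frac{1 - 2^{-(K-1)}}{1 - 2^{-K}} + e^{-O(\alpha)} , \tag{28} $$

and finally

$$ \frac{\langle \ell_+ \rangle_{z>0} - \langle \ell_- \rangle_{z>0}}{\langle \ell_+ \rangle_{z>0} + \langle \ell_- \rangle_{z>0}} = \frac{1}{2^K - 1} + e^{-O(\alpha)} . \tag{29} $$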
The average value of the bias between the numbers of positive and negative occurrences found for $\mathcal{P}_{\rm sat}$ coincides with
its counterpart for $\mathcal{P}_{\rm plant}$ at large ratios $\alpha$. Consider again the planted distribution with respect to $X = (1, \cdots, 1)$.
It is easy to show that variables occur more frequently non-negated [12]. Indeed, if $x$ enters a given clause $C$, there
are $2^K - 1$ possible assignments of the negations, among which $2^{K-1}$ contain $x$ and $2^{K-1} - 1$ contain $\bar{x}$ (because the
assignment in which all variables are negated is forbidden). Then it is clear that

$$ \frac{\langle \ell_+ \rangle_{\rm plant} - \langle \ell_- \rangle_{\rm plant}}{\langle \ell_+ \rangle_{\rm plant} + \langle \ell_- \rangle_{\rm plant}} = \frac{1}{2^K - 1} , \tag{30} $$

as in (29).
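
The counting argument behind (30) can be verified by brute-force enumeration. In the Python sketch below (the function name `planted_bias` is chosen for illustration) the root is $X = (1, \cdots, 1)$, every negation pattern of a clause is listed, the single forbidden pattern is discarded, and the occurrences of the first variable as $x$ and as $\bar{x}$ are counted.

```python
from fractions import Fraction
from itertools import product


def planted_bias(k):
    """Exact bias (<l+> - <l->) / (<l+> + <l->) for the root X = (1, ..., 1)."""
    plain = negated = 0
    for signs in product([False, True], repeat=k):   # True means negated
        if all(signs):
            continue          # all literals negated: clause violated by X
        if signs[0]:
            negated += 1      # the first variable enters as x-bar
        else:
            plain += 1        # the first variable enters as x
    return Fraction(plain - negated, plain + negated)


bias_3sat = planted_bias(3)   # Fraction(1, 7)
```

For $K = 3$ there are 4 allowed patterns containing $x$ and 3 containing $\bar{x}$, which gives the bias $1/7 = 1/(2^3 - 1)$.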

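C. Relative entropy

The relative entropy of $\mathcal{P}_{\rm plant}$ with respect to $\mathcal{P}_{\rm sat}$ is given, using (1), by

$$ \sigma = -\sum_F \mathcal{P}_{\rm sat}[F] \log \frac{\mathcal{P}_{\rm plant}[F]}{\mathcal{P}_{\rm sat}[F]} = -\log \mathcal{N}_{\rm sat} + N \log 2 + \log \mathcal{N}_f - \sum_F \mathcal{P}_{\rm sat}[F] \log \mathcal{N}_s[F] , \tag{31} $$

where $\mathcal{N}_{\rm sat}$ denotes the number of satisfiable formulas. We have

$$ \mathcal{N}_{\rm sat} = e^{N \omega(0)} \times \mathcal{N} , \qquad \text{where} \quad \mathcal{N} = \left[ \binom{N}{K} 2^K \right]^M \tag{32} $$

is the total number of formulas. Using $\rho_0 \simeq e^{-\gamma}/2$, $\gamma(\rho_0) \simeq \gamma + O(e^{-\gamma})$,
$\rho_0 e^{\gamma(\rho_0)} \simeq 1/2 + O(e^{-\gamma})$, we get

$$ \omega(0) \simeq \log 2 + \alpha \log\left( 1 - \frac{1}{2^K} \right) - \frac{1}{2} \gamma e^{-\gamma} + O(e^{-\gamma}) . \tag{33} $$

The value of the number of formulas sharing a common root, $\mathcal{N}_f$, was given in Section II. The last term in (31)
represents the average entropy of satisfiable formulas. It is bounded from above by $N \rho_0 \log 2 \sim N e^{-\gamma}$, because $\rho_0$
is an upper bound to the fraction of variables that can change values from solution to solution (inside the unique
cluster).

Gathering all contributions we get the following expression for the relative entropy, valid for large ratios $\alpha$,

$$ \sigma = \frac{1}{2} N \gamma \, e^{-\gamma} + O(N e^{-\gamma}) . \tag{34} $$

Hence $\sigma$ is extensive in $N$, and decreases exponentially with $\alpha$.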

V. FINITE ENERGY RESULTS 

The previous results extend to formulas having a small minimal fraction of unsatisfied clauses. This point is
interesting since the relationship between approximation hardness and average-case complexity can be deduced from
a weaker form of hypothesis 1 [9],

Hypothesis 2: For every fixed $\epsilon > 0$, for $\alpha$ arbitrarily large (but independent of $N$), there is no polynomial time
algorithm that on most 3-SAT formulas outputs typical, and never outputs typical on 3-SAT formulas with $(1 - \epsilon)M$
satisfiable clauses.

If we choose $\nu$ to be a large, finite number we find from the above replica calculation that the ground state energy
(C12) dominating the integral (5) becomes

eoM-^[l + 0{^'e-^)]e-^ (35) 

for large $\alpha$. As in the $\nu \to \infty$ case, most cavity fields are non zero and typically of the order of $\alpha$. In addition,
the calculation of Section IV B of the average difference between the numbers of $\pm$
occurrences of a variable with positive field can be extended to the case of large but finite $\nu$, with the result



(36)
to first order in $e^{-\nu}$. Eliminating $\nu$ between (35) and (36) we obtain



'2>0 



^+)z>0 + (^-) 



z>0 



2^-1 



eK2 



K 



1 



2^+1-2 aK 



(37) 



to first order in the ground state energy density, $e$. This suggests that also formulas that are not exactly satisfiable
but have few violated clauses ($e \ll 2^{-(K-1)}/K$) can be detected by the WP algorithm. A consequence is that a weaker
version of Hypothesis 2 in which "never" is replaced with "with probability $p$" should also be false for any $p > 0$.

We have checked the validity of this prediction on the following distribution of formulas, referred to as $\mathcal{P}^{(E)}_{\rm plant}$
hereafter. Pick uniformly at random a configuration $X$ of the variables, and choose $M$ times independently a set
of $K$ indices uniformly over the $\binom{N}{K}$ possible ones to build $M$ clauses. For the first $E$ clauses, the negations of the
variables are chosen such that the clause is violated by $X$ (there is only one such assignment), while for the remaining
$M - E$ clauses the negations are chosen such that the clause is satisfied in $X$ (there are $2^K - 1$ such assignments and
we choose one of them at random, as in the planted distribution). A simple calculation similar to the one of Section
IV C shows that the relative entropy, $\sigma(e = E/N)$, of $\mathcal{P}^{(E)}_{\rm plant}$ and $\mathcal{P}_{\rm unif}$ constrained to formulas with ground state energy
$E$ is $\sigma(e) = N [e^{-O(\alpha)} (1 + O(e)) + O(e^2)]$. Thus both distributions are similar in the large $\alpha$ limit, at least for small
enough $e$.

From a numerical point of view we extracted 3-SAT formulas from $\mathcal{P}^{(E)}_{\rm plant}$ with $N = 200$ variables, $M = 2000$ (i.e.
$\alpha = 10$) and studied the convergence of the WP algorithm as a function of $E$. When the algorithm converges, it returns
a partial assignment of the variables [10], the unassigned variables having a zero cavity field. Without entering into a
detailed numerical investigation, we roughly observed that for $E \lesssim 10$ the algorithm behaves essentially as for $E = 0$:
it converges after a few iterations, and in the returned partial assignment most of the variables ($\sim 197 \sim N(1 - e^{-\gamma})$)
have the same value they have in the reference configuration $X$, and the rest of the variables are unassigned. After
optimization over the unassigned variables, the energy of the resulting configuration differs from $E$ by $\sim N e^{-\gamma} \sim 3$
at most. Note that $E = 10$ corresponds to $e = 10/200 = 0.05$ and is compatible with the value $e_c \sim 2^{-(K-1)}/K \sim 0.08$
we found above (37) (there are corrections proportional to $N^{-1/2}$). Above $E \sim 15$ the probability of convergence
decreases, and the number of unassigned variables increases, but when the algorithm converges and one optimizes over
the unassigned variables, the resulting configuration has an energy within $\sim 3$ of $E$. Above $E \sim 50$ the algorithm
almost never converges. Finally, it is interesting to observe that, when the algorithm converges, it does so after
$\sim \log N \sim 6$ iterations, as predicted in [10] for $E = 0$, independently of the value of $E$. If convergence is not attained
after $\sim 10$ iterations, it is very likely that the algorithm will not converge in the following iterations. This allows one
to put a cut-off on the number of required iterations a priori.
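
The orders of magnitude quoted above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: it uses the large-$\alpha$ value $\gamma = \alpha K / (2^K - 1)$ of Eq. (21), and the expression $2^{-(K-1)}/K$ for the threshold energy density is an assumed reading of the garbled estimate in the text, chosen because it reproduces the quoted value 0.08 for $K = 3$.

```python
import math

n_vars, n_clauses, k = 200, 2000, 3
alpha = n_clauses / n_vars                  # clause-to-variable ratio, here 10
gamma = alpha * k / (2 ** k - 1)            # large-alpha value of gamma, Eq. (21)

core = n_vars * (1.0 - math.exp(-gamma))    # variables with non-zero field
free = n_vars * math.exp(-gamma)            # unassigned variables
e_c = 2.0 ** (-(k - 1)) / k                 # assumed form of the threshold density

print(round(core), round(free), round(e_c, 2), round(math.log(n_vars), 1))
# prints: 197 3 0.08 5.3
```

The first three numbers match the values 197, 3 and 0.08 reported in the text, and $\log N \approx 5.3$ is consistent with convergence after about 6 iterations.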



VI. CONCLUSIONS 



The present work supports the claim that satisfiable formulas from the uniform distribution can be recognized in
polynomial time with probability close to unity provided the clause-to-variable ratio is made large enough. In
other words WP should be efficient in solving the random $K$-SAT problem at large $\alpha$. This claim comes from the
closeness of the two distributions $\mathcal{P}_{\rm sat}$ and $\mathcal{P}_{\rm plant}$ for large but finite ratios $\alpha$. More precisely both distributions
produce formulas that (1) have a single cluster of solutions, in which (2) a large fraction $1 - e^{-O(\alpha)}$ of variables are
strongly constrained (they have the same value in all solutions and a cavity field $O(\alpha)$) and a small fraction $e^{-O(\alpha)}$ is
free to change its value (zero cavity field). Moreover, (3) as shown in Section IV B, a positively constrained variable
$x$ (i.e. TRUE in all the solutions) is very likely to appear more times as $x$ than as $\bar{x}$ in the formula. The efficiency
of WP on $\mathcal{P}_{\rm plant}$ relies on these properties, and therefore extends to $\mathcal{P}_{\rm sat}$, then to $\mathcal{P}_{\rm unif}$ once a cut-off (of the order
of $\log N$) is imposed on the number of iterations. Furthermore these results extend to the case of a small but finite
energy. Formulas with a minimal fraction of unsatisfied clauses larger than zero but much smaller than $2^{-K}$ (the
typical value at large $\alpha$) can be recognized with large probability by WP in polynomial time.

Yet the above findings are somewhat unsatisfactory for the following reason. It is easy to repeat the statistical
mechanics calculations presented here for other Boolean functions expressing the truth values of clauses from the
variables, e.g. the XORSAT model [13]. The outcome is that at large ratios properties (1) and (2) hold quite generally
but property (3) does not. Hence while from a probabilistic point of view the solution spaces of satisfiable SAT and
XORSAT formulas far above the threshold are similar, they are not from an algorithmic point of view. More precisely
WP cannot find out whether a XORSAT formula is typical (and has a minimal fraction of unsatisfiable clauses close to
$\frac{1}{2}$) or exceptional (minimal fraction $\epsilon \ll \frac{1}{2}$). It would thus be interesting to devise an algorithm capable of performing
this task. What implications this would have on hypothesis 2 remains to be clarified too.
APPENDIX A: REPLICATED FREE ENERGY

Here we sketch the derivation of the replicated free energy following [5]. The partition function can be written as
$Z[F] = \sum_X \prod_{i=1}^{M} e_i(X)$, where $e_i(X) = 1$ if the clause $C_i$ = TRUE in configuration $X$ and $e^{-\beta}$ otherwise. Then

$$ \overline{Z[F]^n} = \overline{\sum_{X_1 \cdots X_n} \prod_{i=1}^{M} e_i(X_1) \cdots e_i(X_n)} = \sum_{X_1 \cdots X_n} \prod_{i=1}^{M} \overline{e_i(X_1) \cdots e_i(X_n)} = \sum_{X_1 \cdots X_n} \left[ \overline{e(X_1) \cdots e(X_n)} \right]^{M} , \tag{A1} $$



as the clauses are all chosen independently and with the same probability distribution. It is convenient to represent
the variables $x_i$ as spins, i.e. $x_i = 0 \leftrightarrow \sigma_i = -1$ and $x_i = 1 \leftrightarrow \sigma_i = 1$; then $\sigma_i^a$ denotes the value of the spin at site $i$
for replica $a$, $\vec{\sigma}_i$ is the $n$-component vector of the replicas of site $i$, $\sigma^a$ is the $N$-component vector of the configuration
of replica $a$, and $\sigma$ is the full replicated configuration. Then we can compute

$$ \overline{e(\sigma^1) \cdots e(\sigma^n)} = \binom{N}{K}^{-1} \sum_{i_1 < \cdots < i_K}^{1,N} \frac{1}{2^K} \sum_{q_1 \cdots q_K}^{-1,1} \prod_{a=1}^{n} \left[ 1 + (e^{-\beta} - 1) \prod_{l=1}^{K} \delta[\sigma^a_{i_l}; q_l] \right] , \tag{A2} $$

where the variables $q_l$ correspond to the random choice of the negation in the clause $C$ ($q_l = 1$ means that the variable
$x_{i_l}$ is negated in $C$). To leading order in $N$ we can neglect the constraint that all the $i$'s have to be different, and
replace $\binom{N}{K}^{-1} \sum_{i_1 < \cdots < i_K}^{1,N}$ with $N^{-K} \sum_{i_1, \cdots, i_K}^{1,N}$.
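Introducing the order parameter

\begin{equation}
\rho(\vec\tau | \sigma) = \frac{1}{N} \sum_{i=1}^{N} \prod_{a=1}^{n} \delta[\tau^a, \sigma_i^a]
\tag{A3}
\end{equation}

that counts the fraction of sites $i$ such that $\vec\sigma_i = \vec\tau$, we can write

\begin{equation}
\overline{e(X_1) \cdots e(X_n)} = \sum_{\vec\tau_1 \cdots \vec\tau_K} \rho(\vec\tau_1 | \sigma) \cdots \rho(\vec\tau_K | \sigma) \, \mathcal{E}(\vec\tau_1, \ldots, \vec\tau_K)
\tag{A4}
\end{equation}

where

\begin{equation}
\mathcal{E}(\vec\tau_1, \ldots, \vec\tau_K) = \frac{1}{2^K} \sum_{q_1 \cdots q_K}^{-1,1} \prod_{a=1}^{n} \left[ 1 + (e^{-\beta} - 1) \prod_{\ell=1}^{K} \delta[\tau^a_\ell, q_\ell] \right] .
\tag{A5}
\end{equation}

Finally we write

\begin{equation}
\overline{Z^n} = \int dc(\vec\tau) \; e^{N \alpha \log \sum_{\vec\tau_1 \cdots \vec\tau_K} c(\vec\tau_1) \cdots c(\vec\tau_K) \, \mathcal{E}(\vec\tau_1, \ldots, \vec\tau_K)} \sum_{\sigma} \prod_{\vec\tau} \delta[c(\vec\tau) - \rho(\vec\tau | \sigma)] ,
\tag{A6}
\end{equation}

and observing that

\begin{equation}
\sum_{\sigma} \prod_{\vec\tau} \delta[c(\vec\tau) - \rho(\vec\tau | \sigma)] = \frac{N!}{\prod_{\vec\tau} [N c(\vec\tau)]!} \sim e^{-N \sum_{\vec\tau} c(\vec\tau) \log c(\vec\tau)}
\tag{A7}
\end{equation}

we finally obtain

\begin{equation}
\overline{Z^n[\beta]} = \int_0^1 dc(\vec\tau) \; e^{N \mathcal{F}[c(\vec\tau), n, \beta]} , \qquad
\mathcal{F}[c(\vec\tau), n, \beta] = - \sum_{\vec\tau} c(\vec\tau) \log c(\vec\tau) + \alpha \log \left[ \sum_{\vec\tau_1 \cdots \vec\tau_K} c(\vec\tau_1) \cdots c(\vec\tau_K) \, \mathcal{E}(\vec\tau_1, \ldots, \vec\tau_K) \right] .
\tag{A8}
\end{equation}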
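The entropic estimate used in (A7), namely that the number of replicated configurations with prescribed frequencies $c(\vec\tau)$ grows as $\exp[-N \sum_{\vec\tau} c \log c]$, can be checked numerically. The following is a minimal sketch in Python; the frequency vector `c` and the sizes `N` are arbitrary illustrative choices and are not taken from the paper.

```python
import math

def log_multinomial(counts):
    """log( N! / prod_k counts[k]! ), computed with log-gamma to avoid overflow."""
    n_total = sum(counts)
    return math.lgamma(n_total + 1) - sum(math.lgamma(k + 1) for k in counts)

def entropy(freqs):
    """-sum_k c_k log c_k, the exponent per site appearing in (A7)."""
    return -sum(ck * math.log(ck) for ck in freqs if ck > 0)

# Hypothetical frequencies c(tau) for n = 2 replicas (4 replicated spin values).
c = [0.4, 0.3, 0.2, 0.1]

def relative_gap(N):
    """Relative difference between (1/N) log multinomial and the entropy."""
    counts = [round(N * ck) for ck in c]
    return abs(log_multinomial(counts) / N - entropy(c)) / entropy(c)

for N in (10**2, 10**4, 10**6):
    print(N, relative_gap(N))
```

The gap shrinks roughly like $\log N / N$, which is the subleading Stirling correction neglected in (A7).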
The partition function (A8) can then be evaluated by a saddle-point, and the saddle-point value of $c(\vec\tau)$ is the average
of the order parameter $\rho(\vec\tau | \sigma)$. For symmetry reasons we expect that $c(\vec\tau) = c(-\vec\tau)$ at the saddle point, so the average
over the signs $(q_1, \ldots, q_K)$ in (A8) can be dropped by setting $q_\ell = 1$.
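The replica symmetric ansatz amounts to choosing

\begin{equation}
c(\vec\tau) = C\Big( \sum_{a} \tau^a \Big) = \int dz \, R(z) \, \frac{e^{\frac{\beta z}{2} \sum_a \tau^a}}{[2 \cosh(\beta z / 2)]^n} ,
\tag{A9}
\end{equation}

where the last expression is a reparametrization of $c(\vec\tau)$ in terms of a new function $R(z)$ thus defined, and which must
satisfy $R(z) = R(-z)$. The normalization $\sum_{\vec\tau} c(\vec\tau) = 1$ implies $\int dz \, R(z) = 1$. Substituting in (A8) we get, in the
limit $\beta \to \infty$, $n \to 0$, with $\nu = n\beta$ fixed: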



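\begin{equation}
\mathcal{F}[R(\cdot), \nu] = - \int \frac{dx \, d\hat{x}}{2\pi} \, e^{i x \hat{x} + \frac{\nu}{2} |\hat{x}|} \, \psi(x) \log \psi(x) + \alpha \log \int dz_1 \cdots dz_K \, R(z_1) \cdots R(z_K) \, e^{-\nu \Phi(z_1, \ldots, z_K)}
\tag{A10}
\end{equation}

where

\begin{equation}
\psi(x) = \int dz \, R(z) \, e^{-i x z - \frac{\nu}{2} |z|} ,
\tag{A11}
\end{equation}

\begin{equation}
\Phi(z_1, \ldots, z_K) =
\begin{cases}
\min(1, z_1, \ldots, z_K) & \text{if } z_j > 0 \;\; \forall j \\
0 & \text{otherwise}
\end{cases}
\tag{A12}
\end{equation}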
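As a sanity check on the replica-symmetric parametrization (A9), one can verify for a small integer number of replicas that the normalization $\int dz\, R(z) = 1$ indeed gives $\sum_{\vec\tau} c(\vec\tau) = 1$, and that an even $R(z)$ gives $c(\vec\tau) = c(-\vec\tau)$. The sketch below is written in Python and assumes a discrete $R(z)$; the field values, weights, `n` and `beta` are hypothetical illustrative inputs.

```python
import itertools
import math

def c_rs(tau, fields, weights, beta):
    """Replica-symmetric c(tau) for R(z) = sum_k weights[k] * delta(z - fields[k])."""
    n = len(tau)
    s = sum(tau)
    return sum(
        w * math.exp(0.5 * beta * z * s) / (2.0 * math.cosh(0.5 * beta * z)) ** n
        for z, w in zip(fields, weights)
    )

# Hypothetical even distribution R(z) on integer fields, normalized to 1.
fields = [-2, -1, 0, 1, 2]
weights = [0.1, 0.2, 0.4, 0.2, 0.1]
n, beta = 3, 0.7

configs = list(itertools.product((-1, 1), repeat=n))
total = sum(c_rs(tau, fields, weights, beta) for tau in configs)
print("sum over tau of c(tau):", total)
```

The sum factorizes over replicas, each factor being $2\cosh(\beta z/2)$, which cancels the denominator of (A9) exactly.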



APPENDIX B: SADDLE-POINT EQUATION 

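Differentiating (A10) with respect to $R(z)$, with the constraint $\int dz \, R(z) = 1$, we get (with $\mathbf{z} = (z, z_2, \ldots, z_K)$):

\begin{equation}
\frac{\delta}{\delta R(z)} \left\{ \mathcal{F}[R(\cdot), \nu] + \lambda \left[ \int R(z') \, dz' - 1 \right] \right\}
= - \int \frac{dx \, d\hat{x}}{2\pi} \, e^{i x \hat{x} + \frac{\nu}{2} |\hat{x}| - i x z - \frac{\nu}{2} |z|} \, [1 + \log \psi(x)]
+ \frac{\alpha K}{\mathcal{D}[R(\cdot)]} \int_{-\infty}^{+\infty} dz_2 \cdots dz_K \, R(z_2) \cdots R(z_K) \, e^{-\nu \Phi(\mathbf{z})} + \lambda = 0
\tag{B1}
\end{equation}

where

\begin{equation}
\mathcal{D}[R(\cdot)] = \int_{-\infty}^{+\infty} dz_1 \cdots dz_K \, R(z_1) \cdots R(z_K) \, e^{-\nu \Phi(z_1, \ldots, z_K)} .
\tag{B2}
\end{equation}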



The function $R(z)$ is even, $R(z) = R(-z)$: in principle we should add a Lagrange multiplier to enforce this constraint;
however, this is equivalent to considering the equation above for $z > 0$ only.

In the last term, using the normalization of $R(z)$ and the definition of $\Phi$, we can write



/oo -1 /"OO 

dz2... dzK R{Z2) . . . i?(z,f )e'^*(^) = 1 - ^ + dz2... dzK R{z2) . . . i?(z/f )e'^*(^) 

= 1 - + J ^l^e-"'"^"^^'*)-"" dz2... dzK R{Z2) ■ . . i?(z^)e*^™(^' 



(B3) 



,Z2,--- ,Zk) 



Defining 



Q{x) 



f 

Jo 



dz2 ■ ■ ■ dzK R{z2) ■ ■ ■ R{zK)e 



ix min(l,22r" i-^k) 



and using the relation $\min(z, x) = -\tfrac12 \left[ |z - x| - z - x \right]$, the last integral in (B3) can be written as



/ 



dxdx 



-1/ min(2:,x) — 



Q{x) = j 



dxdx 



-ixz-\-ixx— z-\- ^ \ x 



q(^^+-J^= j dxK{z,x)Q 



(B4) 



(B5) 



having defined the kernel K{z^x) = / |f- 



^1, that appears also in equation (Bl) for z > 0; note that 



/ dxK{z,x) = 1. The saddle point equation (Bl) then becomes, for z > 0: 

aK 







A - 1 - log<^(a;) + 



V[R{.)] 



(B6) 
A solution of this equation is obtained when the term in curly brackets vanishes. Inverting Eq. (A11),
$R(z) = \int \frac{dx}{2\pi} \, e^{i x z + \frac{\nu}{2} |z|} \, \psi(x)$, and expressing $\psi(x)$ using (B6) we get



R{z) 



+ OC 



Substituting 



-exp|z,xz + -N + A-l + ^p^ 



+ 00 



R{z) = rn S{z - n) 



(B7) 



(B8) 



in (B7) we obtain the coefficients: 



B = 



l+(i^)"(e-_l) 

where the denominator of $B$ is $\mathcal{D}[R(\cdot)]$ from Eq. (B2) and $I_n(x)$ is the modified Bessel function of order $n$.



(B9) 
(BIO) 



APPENDIX C: GROUND STATE ENERGY 



We want to compute the ground state energy 



d 



1 j ||.| ^(^)iog(^(2.)_[i + iog^(^)] J^°° dz e-'^'-^^^'^lzlRiz) 



V[R{-)] 

We can use the saddle point equations (B1) and (B7) to eliminate the integrals over $x$ and $\hat{x}$ and obtain:

_ aK f°° 

eo{v) = - dz zR{z) + —— / dz2-.- dzx 
Jo 4 Jo 

R{zi) . . . R{zk) 



R{z2)...R{zk) . ,^ 

V{R{-)\ "^'"(1^^2,...,.^)+ 



+ ay^ dz....dz^ ^^^^^^ 
Using Eq. (6) for $R(z)$ we have



K 



min(l, 22, • • • , Zk) + (1 - -ftT) min(l, 2i, . . . , Zk) 



V min(l,2:i , -.zk) 



a 1 



K 



2 1 + (1^)^ (e-. _ 1) V 2 ; 1 + (1^)^ (e-. _ 1) 



and from the expressions (B9) and (B10) for $r_n$ and $B$ we obtain

d , ^, , aKBl + rn i„ K\ „ 1 — rn 1, 

e-o{^) = —^\ogX{aKB, v) + — ^ e^^ + a[l--jB e'^' 

\begin{equation}
\mathcal{I}(z, \nu) = \sum_{n=-\infty}^{+\infty} e^{\frac{\nu}{2} |n|} \, I_n(z)
\end{equation}

For $\nu > 0$ this sum always converges, as can be seen from Eq. (D7).



(Cl) 

(C2) 
(C3) 
1. The limit $\nu \to \infty$



We are interested in the limit $\nu \to \infty$ because in this limit $e_0(\nu) \to 0$, as we will show. Let us define $\epsilon = e^{-\nu}$ and from
(B10) write:



G 



aKBe^" aK ( 



1— m 
2 



, K-1 



aK {^) 



K-l 



K 



1 - e- 



(^) 



1 -e 



2g 1 - rp 
aK 2 



2 l-(i^)^ 

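Using the small-$z$ expansion of the Bessel functions $I_n(z)$ ($n \ge 0$)

\begin{equation}
I_n(z) = \frac{z^n}{2^n \, n!} \left[ 1 + \frac{z^2}{4(n+1)} + O(z^4) \right]
\end{equation}

and $I_{-n}(z) = I_n(z)$ we have, using the identities

\begin{equation}
\sum_{n=-\infty}^{+\infty} \frac{G^{|n|}}{|n|!} = 2 e^{G} - 1 , \qquad
\sum_{n=-\infty}^{+\infty} \frac{G^{|n|+2}}{(|n|+1)!} = 2 G e^{G} - G^2 - 2G ,
\end{equation}

that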



1 + 



n +1 



2e 



n| + l)! 

l + e(2Ge'^-G2-2G) + 0(e2). 

l + eG' + 0(e") 

2eG - 1 + e(2GeG - G^ - 2G) + 0{e^) ~ 26^-1 + e(2GeG - G^ - 2G) + 0(e2) 



The equation for $r_0$ is, from (B9) and (C5),

Io{2Gei-^) 



2e5 - 1 \ 



1 + e^^ + 



2e5 - 1 
= Fo{ro) + eF,{ro) + 0{e^) . 
The solution to the previous equation is 



2eS^,^—^-2ge^ + g' + 2g 
aK 2 



ro = Po + epi 
1 

Po 



Pi 



2e^ip«) - 1 ' 



To write the energy we need to compute 



G'^ 



1 



^ (n- 1)! 

n=l n=l ^ ' 

The energy (C2) is then given by, neglecting $O(\epsilon^2)$:



eG^ 
n + 1 



Ge° + eG(l - + Ge'^) + 0{e'^ 



eo{v) = -- 



Ge^ + tG{l - e° + Ge°) ^ 1 + ro 



+ G- 



+ e(--l)G 



,1-ro 



(C4) 



(C5) 



(C6) 



(C7) 



(C8) 



(C9) 



(CIO) 



(Cll) 



2eG - 1 + e(2GeG - G^ - 2G) 

Given that $e_0(\nu) = -\partial_\nu \mathcal{F}$ and that $r_0$ is the solution of $\partial_{r_0} \mathcal{F} = 0$, the term $\epsilon \rho_1$ in $r_0$ should not contribute to $e_0(\nu)$
at first order in $\epsilon$. Then we can write

eo{u) = -ge^po^l-e—- 



Po 



1 



Po 



+ e [e-^ - 1 + a] - epo [2^6^ - g^ - 2g] | 



+ gpoe 

72 



1 - e 



2g 1-po 

aK 2 



+ e 



K 



A- Po 



(C12) 



= ege^po _ -i + g]+po [2ge^ -g^-2g]+^ 

{ aK ^ Pot 



poe^ \K 



1 - Po 



and $e_0(\nu) \sim \epsilon$ for large $\nu$. The latter expression is complicated, but it simplifies considerably in the limit $\alpha \to \infty$.
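The two series identities and the small-argument Bessel expansion used in this appendix can be checked numerically. The sketch below, in Python, assumes the definitions $\epsilon = e^{-\nu}$ and $\mathcal{I}(z, \nu) = \sum_n e^{\nu |n| / 2} I_n(z)$ as read here, evaluated at $z = 2 G \epsilon^{1/2}$; the values of `G` and `nu` are arbitrary illustrative choices, and `bessel_i` is a helper written for this example from the standard power series.

```python
import math

def bessel_i(n, z, terms=30):
    """Modified Bessel function I_n(z) for integer n, from its power series."""
    n = abs(n)  # I_{-n}(z) = I_n(z)
    return sum(
        (z / 2.0) ** (2 * k + n) / (math.factorial(k) * math.factorial(k + n))
        for k in range(terms)
    )

def weighted_sum(z, nu, n_max=40):
    """sum_n exp(nu |n| / 2) I_n(z), truncated at |n| <= n_max."""
    return sum(
        math.exp(0.5 * nu * abs(n)) * bessel_i(n, z)
        for n in range(-n_max, n_max + 1)
    )

G, nu = 1.3, 12.0  # illustrative values, not taken from the paper
eps = math.exp(-nu)

# Identity 1: sum_n G^|n| / |n|! = 2 e^G - 1
lhs1 = sum(G ** abs(n) / math.factorial(abs(n)) for n in range(-40, 41))
rhs1 = 2 * math.exp(G) - 1
# Identity 2: sum_n G^(|n|+2) / (|n|+1)! = 2 G e^G - G^2 - 2 G
lhs2 = sum(G ** (abs(n) + 2) / math.factorial(abs(n) + 1) for n in range(-40, 41))
rhs2 = 2 * G * math.exp(G) - G**2 - 2 * G

# Exact sum versus its expansion to first order in eps.
exact = weighted_sum(2.0 * G * math.sqrt(eps), nu)
first_order = rhs1 + eps * rhs2
print(lhs1 - rhs1, lhs2 - rhs2, exact - first_order)
```

The residual of the last comparison is of order $\epsilon^2$, while dropping the $O(\epsilon)$ term leaves an error of order $\epsilon$.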



APPENDIX D: FREE ENERGY OF THE RS SOLUTION 



Finally, we can compute the free energy corresponding to the solution to (B9). We begin by calculating 



/oo 



S{z — n) = 



aOtKB cos X 



I{aKB,v) I{aKB,v) ' 

Then using $\psi(x) \log \psi(x) = \left. \frac{d}{dp} \psi(x)^p \right|_{p=1}$ we rewrite the first term in (A10) as



dp 

d_ 
dp 

d_ 
dp 



olKB cos X 

X{aKB)P 



dx 



62 



p olKB cos X 



^^J I{aKB)pJ 277 
where all integrals are between $-\infty$ and $+\infty$. The $dx$ integral is of the form

f +00 



cos(^a;) 



where 



/ dx 
— cos(xa;) ■i/'(cosa;) = ^ f„ d{n - x) 
-°° n=-oo 

/„ = / — e""V(cost) = - / dteP cos(nt) = /„(p aKB). 

Jo 27r TT Jo 
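The last equality above relies on the standard integral representation of the modified Bessel function, $I_n(a) = \frac{1}{\pi} \int_0^{\pi} dt \, e^{a \cos t} \cos(nt)$, here with $a = p \, \alpha K B$. A minimal numerical check in Python follows; the value of `a` is an arbitrary illustrative choice, and both helper functions are written for this example rather than taken from the paper.

```python
import math

def bessel_i_series(n, z, terms=40):
    """I_n(z) for integer n from the power series."""
    n = abs(n)
    return sum(
        (z / 2.0) ** (2 * k + n) / (math.factorial(k) * math.factorial(k + n))
        for k in range(terms)
    )

def bessel_i_integral(n, a, steps=2000):
    """(1/pi) * integral over [0, pi] of exp(a cos t) cos(n t), midpoint rule."""
    h = math.pi / steps
    total = 0.0
    for j in range(steps):
        t = (j + 0.5) * h
        total += math.exp(a * math.cos(t)) * math.cos(n * t)
    return total * h / math.pi

a = 0.8  # stands for p * alpha * K * B; illustrative value only
for n in range(5):
    print(n, bessel_i_integral(n, a), bessel_i_series(n, a))
```

The midpoint rule is used because the integrand extends to a smooth periodic function, for which equally spaced quadrature converges very quickly.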



In this way, we obtain for the double integral over $x$ and $\hat{x}$



d_ 
dp 



I{aKB,vy T{aKB,u) 
The energy term in the free energy is just $\alpha \log \mathcal{D}[R(\cdot)]$, and we obtain

r{v) = -aKB ^ + log J(al^S, v) +a\og 

The sums can be expressed in terms of fast-converging series (for $\nu > 0$):



\ogI{aKB,v). 



K 



l(i.o)(2,j.) = 2cosh[ii^') e""°"H5-) _e-5''/o(z)-2cosh('i:/') ^e^^''"/„(z). 



APPENDIX E: EIGENVALUES OF THE STABILITY MATRIX IN THE RS SOLUTION 

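Differentiation of the free energy (A8) gives

\begin{equation}
M_{\vec\sigma \vec\tau} \equiv \frac{\partial^2 \mathcal{F}}{\partial c(\vec\sigma) \, \partial c(\vec\tau)}
= - \frac{\delta_{\vec\sigma \vec\tau}}{c(\vec\sigma)}
+ \alpha K (K-1) \frac{\sum_{\vec\sigma_3 \cdots \vec\sigma_K} c(\vec\sigma_3) \cdots c(\vec\sigma_K) \, \mathcal{E}(\vec\sigma, \vec\tau, \vec\sigma_3, \ldots, \vec\sigma_K)}{\sum_{\vec\sigma_1 \cdots \vec\sigma_K} c(\vec\sigma_1) \cdots c(\vec\sigma_K) \, \mathcal{E}(\vec\sigma_1, \ldots, \vec\sigma_K)}
- \alpha K^2 \frac{\left[ \sum_{\vec\sigma_2 \cdots \vec\sigma_K} c(\vec\sigma_2) \cdots c(\vec\sigma_K) \, \mathcal{E}(\vec\sigma, \vec\sigma_2, \ldots, \vec\sigma_K) \right] \left[ \sum_{\vec\sigma_2 \cdots \vec\sigma_K} c(\vec\sigma_2) \cdots c(\vec\sigma_K) \, \mathcal{E}(\vec\tau, \vec\sigma_2, \ldots, \vec\sigma_K) \right]}{\left[ \sum_{\vec\sigma_1 \cdots \vec\sigma_K} c(\vec\sigma_1) \cdots c(\vec\sigma_K) \, \mathcal{E}(\vec\sigma_1, \ldots, \vec\sigma_K) \right]^2}
\tag{E1}
\end{equation}

with $\mathcal{E}$ as in (A5). The function $c(\vec\sigma)$ can be computed at the saddle point using (A9), (6) and (8); defining $s = \frac{1}{n} \sum_a \sigma^a$

we have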



c{a) ^ 



2,5 _ 1 - -J ■ - {^^-^Il^l' 1] + (1 - 1])} ' (E2) 

where the last equality holds for $\nu \to \infty$. For large $\alpha$, $G$ becomes large and the expression for $c(\vec\sigma)$ further simplifies:

\begin{equation}
c(\vec\sigma) = \tfrac12 \, \delta[|s|, 1] + O(e^{-G}) .
\tag{E3}
\end{equation}

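This allows a straightforward calculation of the sums appearing in (E1). In order to do this, we observe that
$\mathcal{E}(\vec\sigma_1, \ldots, \vec\sigma_K)$, defined in equation (A5), is equal (in the limit $\beta \to \infty$) to $1/2^K$ times the number of vectors in
$\{-1, 1\}^K$ that are not equal to any of the columns of the matrix whose rows are the vectors $\vec\sigma_1, \ldots, \vec\sigma_K$. Then, to $o(1)$
in $\alpha$:

\begin{equation}
\mathcal{D}_K \equiv \sum_{\vec\sigma_1, \ldots, \vec\sigma_K} c(\vec\sigma_1) \cdots c(\vec\sigma_K) \, \mathcal{E}(\vec\sigma_1, \ldots, \vec\sigma_K)
\tag{E4}
\end{equation}

\begin{equation}
= 2^{K} \times \frac{1}{2^{K}} \times \frac{1}{2^K} \left[ 2^K - 1 \right] = 1 - \frac{1}{2^K}
\tag{E5}
\end{equation}

since the only terms that contribute to the sum are those with $\vec\sigma_i = (1, 1, \ldots, 1)$ or $(-1, -1, \ldots, -1)$, and the
corresponding matrices have all the columns equal (so that there are $2^K - 1$ vectors that are not equal to any column). In
the same way we obtain

\begin{equation}
\mathcal{D}_{K-1}(\vec\sigma) \equiv \sum_{\vec\sigma_2, \ldots, \vec\sigma_K} c(\vec\sigma_2) \cdots c(\vec\sigma_K) \, \mathcal{E}(\vec\sigma, \vec\sigma_2, \ldots, \vec\sigma_K)
\tag{E6}
\end{equation}

\begin{equation}
= 2^{K-1} \times \frac{1}{2^{K-1}} \times \frac{1}{2^K} \left[ 2^K - (2 - \delta[|s|, 1]) \right] = 1 - \frac{1}{2^{K-1}} + \frac{\delta[|s|, 1]}{2^K}
\tag{E7}
\end{equation}

since if $|s| = 1$ all the columns are equal, while if $|s| < 1$ there will be two different column types, and

\begin{equation}
\mathcal{D}_{K-2}(\vec\sigma, \vec\tau) \equiv \sum_{\vec\sigma_3, \ldots, \vec\sigma_K} c(\vec\sigma_3) \cdots c(\vec\sigma_K) \, \mathcal{E}(\vec\sigma, \vec\tau, \vec\sigma_3, \ldots, \vec\sigma_K)
\tag{E8}
\end{equation}

\begin{equation}
= 2^{K-2} \times \frac{1}{2^{K-2}} \times \frac{1}{2^K} \left[ 2^K - D(\vec\sigma, \vec\tau) \right] = 1 - \frac{D(\vec\sigma, \vec\tau)}{2^K}
\tag{E9}
\end{equation}

where the function $D(\vec\sigma, \vec\tau)$ counts the number of different pairs, among the possible four $(-1,-1)$, $(-1,1)$, $(1,-1)$
and $(1,1)$, that actually occur in $\{(\sigma^a, \tau^a), a = 1, \ldots, n\}$. It is straightforward to verify the recursion relations

\begin{equation}
\mathcal{D}_K = \sum_{\vec\sigma} c(\vec\sigma) \, \mathcal{D}_{K-1}(\vec\sigma) ,
\tag{E10}
\end{equation}

\begin{equation}
\mathcal{D}_{K-1}(\vec\sigma) = \sum_{\vec\tau} c(\vec\tau) \, \mathcal{D}_{K-2}(\vec\sigma, \vec\tau) .
\tag{E11}
\end{equation}

Then we get, neglecting terms of order $e^{-G}$:

\begin{equation}
M_{\vec\sigma \vec\tau} = - \frac{(2 e^{G} - 1) \, \delta_{\vec\sigma \vec\tau}}{e^{G} \delta[|s|, 1] + (1 - \delta[|s|, 1])}
+ \frac{\alpha K (K-1) \left[ 2^K - D(\vec\sigma, \vec\tau) \right]}{2^K - 1}
- \frac{\alpha K^2 \left( \delta[|s|, 1] + 2^K - 2 \right) \left( \delta[|t|, 1] + 2^K - 2 \right)}{(2^K - 1)^2} .
\tag{E12}
\end{equation}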

This matrix is invariant under permutations of the replicas so it preserves the symmetry of the vectors under
permutations. This means that it can be block-diagonalized in subspaces of given replica symmetry.
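The counting rule for $\mathcal{E}$ at zero temperature and the closed forms quoted for $\mathcal{D}_{K-1}$ and $\mathcal{D}_{K-2}$ can be verified by brute force for small sizes. The sketch below, in Python, takes $c(\vec\sigma) = \tfrac12 \delta[|s|, 1]$ as in (E3) and uses an integer number of replicas; the values `K = 3`, `n = 3` are illustrative, and the function names are introduced only for this example.

```python
import itertools

def clause_weight(rows):
    """Zero-temperature E(sigma_1..sigma_K): average over sign choices q of
    prod_a [1 - prod_l delta(sigma_l^a, q_l)], with rows[l][a] = sigma_l^a."""
    K, n = len(rows), len(rows[0])
    total = 0
    for q in itertools.product((-1, 1), repeat=K):
        # q contributes only if it differs from every column of the K x n matrix
        total += all(any(rows[l][a] != q[l] for l in range(K)) for a in range(n))
    return total / 2 ** K

def distinct_columns(rows):
    """Number of distinct columns of the matrix whose rows are the given vectors."""
    return len(set(zip(*rows)))

K, n = 3, 3
uniform = [(1,) * n, (-1,) * n]   # the two replica vectors with |s| = 1
weight = 0.5                      # c(sigma) = 1/2 on each of them, as in (E3)

def d_km1(sigma):
    """Sum over sigma_2..sigma_K of c...c E(sigma, sigma_2, ..., sigma_K)."""
    return sum(weight ** (K - 1) * clause_weight((sigma,) + rest)
               for rest in itertools.product(uniform, repeat=K - 1))

def d_km2(sigma, tau):
    """Sum over sigma_3..sigma_K of c...c E(sigma, tau, sigma_3, ..., sigma_K)."""
    return sum(weight ** (K - 2) * clause_weight((sigma, tau) + rest)
               for rest in itertools.product(uniform, repeat=K - 2))

all_vectors = list(itertools.product((-1, 1), repeat=n))
print(d_km1((1, 1, 1)), d_km1((1, -1, 1)), d_km2((1, -1, 1), (1, 1, -1)))
```

For these sizes the brute-force sums agree with $1 - 2^{1-K} + \delta[|s|,1]/2^K$ and $1 - D(\vec\sigma,\vec\tau)/2^K$ for every choice of the free vectors.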

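First we have to take into account the constraint $\sum_{\vec\sigma} c(\vec\sigma) = 1$. This can be done by considering
$\mathcal{F}'[c'(\vec\sigma)] = \mathcal{F}[1 - \sum_{\vec\sigma \neq \vec{1}} c'(\vec\sigma), c'(\vec\sigma)]$, where $c'(\vec\sigma) = c(\vec\sigma)$ for $\vec\sigma \neq (+1, \ldots, +1) \equiv \vec{1}$ and $c'(\vec\sigma)$ has no $\vec{1}$ component. Then it is easy to
show that the Hessian matrix of $\mathcal{F}'$ with respect to $c'$ is ($\vec\sigma, \vec\tau \neq \vec{1}$):

\begin{equation}
M'_{\vec\sigma \vec\tau} = - \frac{(2 e^{G} - 1) \, \delta_{\vec\sigma \vec\tau}}{e^{G} \delta[|s|, 1] + (1 - \delta[|s|, 1])} - \frac{2 e^{G} - 1}{e^{G}}
- \frac{\alpha K (K-1)}{2^K - 1} \left[ 1 + D(\vec\sigma, \vec\tau) - D(\vec{1}, \vec\tau) - D(\vec\sigma, \vec{1}) \right]
- \frac{\alpha K^2 \left( \delta[|s|, 1] - 1 \right) \left( \delta[|t|, 1] - 1 \right)}{(2^K - 1)^2} .
\tag{E13}
\end{equation}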
Let us start with the non-symmetric subspaces. In these subspaces, $|s| \neq 1$ and $|t| \neq 1$, then

\begin{equation}
M'_{\vec\sigma \vec\tau} = - 2 e^{G} \delta_{\vec\sigma \vec\tau} - \frac{\alpha K (K-1)}{2^K - 1} \left[ 1 + D(\vec\sigma, \vec\tau) - D(\vec{1}, \vec\tau) - D(\vec\sigma, \vec{1}) \right] - 2 - \frac{\alpha K^2}{(2^K - 1)^2} .
\tag{E14}
\end{equation}

The diagonal term is $O(e^{G})$ while the off-diagonal part is $O(\alpha)$. This means that even in the properly symmetrized
basis the matrix elements will have a diagonal part of $O(e^{G})$ while the off-diagonal elements will be $O(\alpha)$. Then it
is easy to show (e.g. in perturbation theory) that the off-diagonal terms can change the eigenvalues at most by a
quantity $O(\alpha 2^n)$, so they cannot change the sign of the eigenvalues. In this space the matrix $M'$ has then only negative
eigenvalues and $\mathcal{F}$ has a maximum.

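In the symmetric subspace we can use the same argument for all the eigenvalues but the diagonal element
corresponding to $\vec\sigma, \vec\tau = -\vec{1}$, which is not $O(e^{G})$. However we can write

\begin{equation}
M'_{\vec\sigma \vec\tau} = - 2 e^{G} \delta_{\vec\sigma \vec\tau} (1 - \delta[|s|, 1]) + V_{\vec\sigma \vec\tau} ,
\tag{E15}
\end{equation}

and treat $V$ as a perturbation. In the dangerous subspace $\vec\sigma = \vec\tau = -\vec{1}$, where the diagonal part has zero eigenvalue,
the matrix element of the perturbation is

\begin{equation}
V_{-\vec{1}, -\vec{1}} = M'_{-\vec{1}, -\vec{1}} = M_{-\vec{1}, -\vec{1}} - M_{-\vec{1}, \vec{1}} - M_{\vec{1}, -\vec{1}} + M_{\vec{1}, \vec{1}} = -4 < 0 ,
\tag{E16}
\end{equation}

so the eigenvalues of $M'$ are all negative for $\alpha$ large enough.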
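The value $-4$ in (E16) can be reproduced from the large-$\alpha$ form of the Hessian. The Python sketch below assumes the expression of $M$ in (E12) as read here (diagonal term $-(2e^G - 1)\delta_{\vec\sigma\vec\tau} / [e^G \delta[|s|,1] + 1 - \delta[|s|,1]]$, pair term proportional to $2^K - D(\vec\sigma,\vec\tau)$, and product term proportional to $(\delta[|s|,1] + 2^K - 2)(\delta[|t|,1] + 2^K - 2)$); the parameter values are illustrative and the function names are hypothetical.

```python
import math

def delta_uniform(sigma):
    """delta[|s|, 1]: 1 if all replicas of the vector are equal, else 0."""
    return 1 if abs(sum(sigma)) == len(sigma) else 0

def pair_count(sigma, tau):
    """D(sigma, tau): number of distinct pairs (sigma^a, tau^a) over replicas a."""
    return len(set(zip(sigma, tau)))

def hessian(sigma, tau, alpha, K, G):
    """Large-alpha Hessian element, following the assumed form of (E12)."""
    ds, dt = delta_uniform(sigma), delta_uniform(tau)
    diag = 0.0
    if sigma == tau:
        diag = -(2 * math.exp(G) - 1) / (math.exp(G) * ds + (1 - ds))
    pair = alpha * K * (K - 1) * (2 ** K - pair_count(sigma, tau)) / (2 ** K - 1)
    single = alpha * K ** 2 * (ds + 2 ** K - 2) * (dt + 2 ** K - 2) / (2 ** K - 1) ** 2
    return diag + pair - single

def constrained(sigma, tau, alpha, K, G):
    """Hessian after eliminating c(+1,...,+1) through the normalization constraint."""
    one = (1,) * len(sigma)
    return (hessian(sigma, tau, alpha, K, G) - hessian(sigma, one, alpha, K, G)
            - hessian(one, tau, alpha, K, G) + hessian(one, one, alpha, K, G))

alpha, K, G, n = 20.0, 3, 8.0, 3   # illustrative values only
minus = (-1,) * n
value = constrained(minus, minus, alpha, K, G)
print(value)
```

Under these assumptions the $\alpha$-dependent terms cancel exactly in this matrix element, leaving $-4 + 2e^{-G}$, which reduces to $-4$ once terms of order $e^{-G}$ are neglected.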



APPENDIX F: SOLUTIONS WITH NON-INTEGER FIELDS IN THE v ^ 00 LIMIT 

We look for a solution with rational-valued fields,

\begin{equation}
R(z) = \sum_{n=-\infty}^{+\infty} r_n \, \delta\!\left[ z - \frac{n}{p} \right]
\tag{F1}
\end{equation}

where $p$ is an integer $> 1$. As the fields are expected to be integer-valued, the existence of such a solution would be
an indication of an instability of the replica symmetric solution. We plug this ansatz in the self-consistent equation
for $R$ (B7) and find a self-consistent equation for the $p$ variables $r_0, r_1, \ldots, r_{p-1}$:

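\begin{equation}
\sum_{n=-\infty}^{+\infty} r_n \cos\!\left( \frac{x n}{p} \right) e^{-\nu |n| / (2p)} = \exp\!\left( \mu + \alpha K \sum_{q=1}^{p} A_q \cos\!\left( \frac{x q}{p} \right) e^{-\nu q / (2p)} \right)
\tag{F2}
\end{equation}

which must be true for any $x$. In the above equation we have defined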

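\begin{equation}
A_1 = \frac{w^{K-1} - (w - r_1)^{K-1}}{1 - w^K} , \quad
A_q = \frac{\left( w - \sum_{j=1}^{q-1} r_j \right)^{K-1} - \left( w - \sum_{j=1}^{q} r_j \right)^{K-1}}{1 - w^K} \;\; (2 \le q \le p-1) , \quad \ldots , \quad
A_p = \frac{\left( w - \sum_{j=1}^{p-1} r_j \right)^{K-1}}{1 - w^K} ,
\tag{F3}
\end{equation}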

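where $w = (1 - r_0)/2$. To calculate the constant $\mu$ we set $x = i\nu/2$ and send $\nu \to \infty$ to obtain

\begin{equation}
\sum_{n=-\infty}^{+\infty} r_n \left( \frac12 + \frac12 \delta_{n,0} \right) = \exp\!\left[ \mu + \frac12 \alpha K \sum_{q=1}^{p} A_q \right] .
\tag{F4}
\end{equation}

In addition, setting $x = 0$ and sending $\nu \to \infty$ we have $r_0 = e^{\mu}$. Combining the two equations above we obtain the
self-consistent equation

\begin{equation}
r_0 = \frac{1}{2 \exp\!\left( \frac{\alpha K}{2} \sum_{q=1}^{p} A_q \right) - 1}
\tag{F5}
\end{equation}

which is the same equation as in the case of integer fields only ($p = 1$). The equation for the weight of the smallest
non zero field reads

\begin{equation}
r_1 = r_0 \, \frac{\alpha K A_1}{2} ,
\tag{F6}
\end{equation}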
and for $r_1 \neq 0$ can be written equivalently, with $y = r_1 / w$, as

\begin{equation}
\frac{y}{1 - (1 - y)^{K-1}} = \alpha K \left[ \frac12 - w \right] \frac{w^{K-2}}{1 - w^K} .
\tag{F7}
\end{equation}

As $r_0$ ranges from 0 to 1, $w$ ranges from 0 to $\tfrac12$. Moreover, note that $r_1 < w$ because $w$ is the probability of having a
positive field; then $y$ ranges from 0 to 1. When $\alpha$ is large the r.h.s. is $\sim \gamma e^{-\gamma}$, which is very small, while the l.h.s. is
larger than $1/(K-1)$ (minimal value at $y = 0$). Therefore the latter equation has no solution and the only solution
to (F6) is $r_1 = 0$.
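The lower bound on the left-hand side used in this argument, $y / [1 - (1-y)^{K-1}] \ge 1/(K-1)$ for $0 < y \le 1$, can be checked numerically. The Python sketch below assumes this form of the left-hand side as read from (F7); the grid and the values of $K$ are illustrative choices.

```python
def lhs(y, K):
    """Left-hand side y / (1 - (1 - y)^(K-1)), assumed form read from (F7)."""
    return y / (1.0 - (1.0 - y) ** (K - 1))

def grid_minimum(K, points=1000):
    """Minimum of lhs over a grid of y values in (0, 1]."""
    ys = [1e-6 + (1.0 - 1e-6) * j / (points - 1) for j in range(points)]
    return min(lhs(y, K) for y in ys)

for K in (3, 4, 5):
    print(K, grid_minimum(K), 1.0 / (K - 1))
```

The bound follows from Bernoulli's inequality, $1 - (1-y)^{K-1} \le (K-1) y$, and is approached as $y \to 0$.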
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